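Heterogeneous networks, which connect informative nodes containing text through different edge types, are routinely used to store and process information in various real-world applications. Graph neural networks (GNNs) and their hyperbolic variants provide a promising approach to encoding such networks in a low-dimensional latent space through neighborhood aggregation and hierarchical feature extraction. However, these approaches typically ignore metapath structures and the available semantic information. They are also sensitive to noise in the training data. To address these limitations, we propose the Text Enriched Sparse Hyperbolic Graph Convolution Network (TESH-GCN), which uses semantic signals to capture a graph's metapath structures and further improve prediction in large heterogeneous graphs. In TESH-GCN, we extract semantic node information, which then acts as a connection signal for extracting the relevant nodes' local neighborhood and graph-level metapath features from a sparse adjacency tensor in a reformulated hyperbolic graph convolution layer. These extracted features are combined with semantic features from a language model (for robustness) for the final downstream task. Experiments on various heterogeneous graph datasets show that our model outperforms current state-of-the-art approaches by a large margin on link prediction. We also report reductions in both training time and model parameters compared with existing hyperbolic approaches, achieved through the reformulated hyperbolic graph convolution. Furthermore, we illustrate the model's robustness using different levels of simulated noise in both the graph structure and the text, and we explain TESH-GCN's prediction mechanism by analyzing the extracted metapaths.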
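As background for the hyperbolic encoding mentioned above, here is a minimal sketch of the standard exponential and logarithmic maps at the origin of a Poincaré ball, which hyperbolic graph layers commonly use to move between Euclidean features and hyperbolic space. The abstract does not say which hyperbolic model TESH-GCN uses, so the choice of the Poincaré ball, the curvature parameter `c`, and the function names are assumptions for illustration only. This is not the paper's reformulated convolution.

```python
import math

def exp_map_origin(v, c=1.0):
    """Map a tangent vector at the origin onto the Poincare ball of curvature -c."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]

def log_map_origin(y, c=1.0):
    """Inverse of exp_map_origin: map a point on the ball back to the tangent space."""
    norm = math.sqrt(sum(x * x for x in y))
    if norm == 0.0:
        return list(y)
    scale = math.atanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in y]

# Hypothetical 2-d node feature (illustrative only).
euclidean_feature = [0.3, -0.4]
on_ball = exp_map_origin(euclidean_feature)
recovered = log_map_origin(on_ball)
print(on_ball, recovered)
```

The mapped point always lies strictly inside the unit ball (for `c = 1`), and applying the logarithmic map recovers the original feature up to floating-point error.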
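Improving the quality of search results can significantly enhance users' experience and engagement with search engines. Despite recent advances in machine learning and data mining, correctly classifying items for a particular user search query remains a long-standing challenge with considerable room for improvement. This paper introduces the "Shopping Queries Dataset", a large dataset of Amazon search queries and results, released to foster research on improving the quality of search results. The dataset contains around 130 thousand unique queries and 2.6 million manually labeled (query, product) relevance judgments. It is multilingual, with queries in English, Japanese, and Spanish. The Shopping Queries Dataset is used in one of the KDDCup'22 challenges. In this paper, we describe the dataset and present three evaluation tasks along with baseline results: (i) ranking the results list, (ii) classifying product results into relevance categories, and (iii) identifying substitute products for a given query. We anticipate that this data will become the gold standard for future research on product search.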
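To make the ranking task (i) concrete, the sketch below computes normalized discounted cumulative gain (nDCG), a common metric for ranked lists with graded relevance. The abstract does not state which metric the baselines use, so the metric choice and the gain values here are assumptions for illustration.

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a list of gains given in ranked order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(gains):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance for one query's ranked results (illustrative only).
ranked_gains = [1.0, 0.0, 0.1, 1.0]
print(round(ndcg(ranked_gains), 3))
```

A list already sorted by decreasing relevance scores exactly 1.0, and pushing relevant products lower in the list reduces the score.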
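Embeddings, low-dimensional vector representations of objects, are fundamental in building modern machine learning systems. In industrial settings, there is usually an embedding team that trains an embedding model to solve intended tasks (e.g., product recommendation). The produced embeddings are then widely consumed by consumer teams to solve their unintended tasks (e.g., fraud detection). However, as the embedding model is updated and retrained to improve performance on the intended task, the newly generated embeddings are no longer compatible with the existing consumer models. This means that either historical versions of the embeddings can never be retired, or all consumer teams have to retrain their models to make them compatible with the latest version of the embeddings, both of which are extremely costly in practice. Here we study the problem of embedding version updates and their backward compatibility. We formalize the problem, where the goal is for the embedding team to keep updating the embedding version while the consumer teams do not have to retrain their models. We develop a solution based on learning backward-compatible embeddings, which allows the embedding model version to be updated frequently while also allowing the latest version of the embedding to be quickly transformed into any of its backward-compatible historical versions, so that consumer teams do not have to retrain their models. Under our framework, we explore six methods and systematically evaluate them on a real-world recommender system application. We show that the best method, which we call BC-Aligner, maintains backward compatibility with existing unintended tasks even after multiple model version updates. At the same time, BC-Aligner achieves intended-task performance similar to that of an embedding model optimized solely for the intended task.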
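The idea of transforming the latest embedding back into a historical version can be illustrated with a small sketch. It assumes the backward transformation is a linear map fitted by least squares on paired old and new embeddings. The abstract does not specify how BC-Aligner learns its transformation, so the linear form, the gradient-descent fit, and all names and data below are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def fit_backward_map(new_embs, old_embs, lr=0.1, steps=2000):
    """Fit W minimising the mean of ||W @ new - old||^2 by plain gradient descent."""
    d_old, d_new, n = len(old_embs[0]), len(new_embs[0]), len(new_embs)
    W = [[0.0] * d_new for _ in range(d_old)]
    for _ in range(steps):
        grad = [[0.0] * d_new for _ in range(d_old)]
        for x, y in zip(new_embs, old_embs):
            pred = matvec(W, x)
            for i in range(d_old):
                err = pred[i] - y[i]
                for j in range(d_new):
                    grad[i][j] += 2.0 * err * x[j] / n
        for i in range(d_old):
            for j in range(d_new):
                W[i][j] -= lr * grad[i][j]
    return W

# Toy paired embeddings of the same four objects (hypothetical data):
# version 2 stretches the first axis by 2 and shrinks the second by half.
old_v1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]]
new_v2 = [[2.0, 0.0], [0.0, 0.5], [2.0, 0.5], [-2.0, 0.5]]

W = fit_backward_map(new_v2, old_v1)
# A v2 embedding of an unseen object, mapped back into the v1 space
# so that a consumer model trained on v1 can keep using it.
print(matvec(W, [4.0, 1.0]))
```

In this toy setup a consumer model trained on version 1 never sees version 2 directly; it only receives `matvec(W, x)`, which is why it would not need retraining.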
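Graph neural networks (GNNs) have emerged as powerful tools for encoding graph-structured data. Because of their broad applications, there is an increasing need to develop tools that explain how GNNs make decisions given graph-structured data. Existing learning-based GNN explanation approaches are task-specific in training and hence suffer from crucial drawbacks. Specifically, they cannot provide explanations for models with multitask prediction using a single explainer. They are also unable to provide explanations when the GNN is trained in a self-supervised manner and the resulting representations are used in future downstream tasks. To address these limitations, we propose a Task-Agnostic GNN Explainer (TAGE) that is independent of downstream models and is trained under self-supervision with no knowledge of downstream tasks. TAGE enables the explanation of GNN embedding models with unseen downstream tasks and allows efficient explanation of multitask models. Our extensive experiments show that TAGE can significantly improve explanation efficiency by using the same model to explain predictions for multiple downstream tasks, while achieving explanation quality as good as or even better than current state-of-the-art GNN explanation approaches. Our code is publicly available as part of the DIG library at https://github.com/divelab/dig/tree/main/main/dig/xgraph/tage/.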
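Graph neural networks (GNNs) have achieved state-of-the-art performance in node classification, regression, and recommendation tasks. GNNs work well when rich, high-quality connection structure is available. However, this requirement is not satisfied in many real-world graphs where node degrees follow a power-law distribution, because many nodes have few or noisy connections. The extreme case of this situation is a node that has no neighbors at all, called the Strict Cold Start (SCS) scenario. This forces the prediction model to rely entirely on the node's input features. We propose Cold Brew to address the SCS and noisy-neighbor settings via a distillation approach. We introduce the feature contribution ratio (FCR), a metric that measures the feasibility of using inductive GNNs to solve the SCS problem and helps select the best architecture for SCS generalization. We show experimentally that FCR disentangles the contributions of the various components of a graph dataset, and we demonstrate the superior performance of Cold Brew on several public benchmarks and proprietary e-commerce datasets. The source code for our approach is available at: https://github.com/amazon-research/gnn-tail-generalization.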
Quadruped robots are currently used in industrial robotics as mechanical aids to automate several routine tasks. However, the use of such robots in a domestic setting is still very much a research topic. This paper discusses the understanding and virtual simulation of such a robot, one capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expressions on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains, detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish a framework for simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio responses. Speech emotion detection was not as performant as ERANNs or Zeta Policy learning, but still reached an accuracy of 63.5%. The video emotion detection system produced results almost on par with the state of the art, with an accuracy of 99.66%. Owing to its "on-policy" learning process, the PPO algorithm learned extremely quickly, allowing the simulated dog to demonstrate a remarkably seamless gait across different cadences and variations. This enabled the quadruped robot to respond to generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.
Text-to-text generation models have increasingly become the go-to solution for a wide variety of sequence labeling tasks (e.g., entity extraction and dialog slot filling). While most research has focused on labeling accuracy, a key aspect of vital practical importance has slipped through the cracks: understanding model confidence. More specifically, we lack a principled understanding of how to reliably gauge a model's confidence in its prediction for each labeled span. This paper aims to provide some empirical insights on estimating model confidence for generative sequence labeling. Most notably, we find that simply using the decoder's output probabilities is not the best way to obtain well-calibrated confidence estimates. As verified over six public datasets covering different tasks, our proposed approach, which leverages statistics from the top-$k$ predictions of a beam search, significantly reduces the calibration error of a generative sequence labeling model's predictions.
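Calibration error can be measured in several ways. The sketch below computes expected calibration error (ECE), a widely used binned estimate, over per-span confidences. The abstract does not name the exact metric or binning used in the paper, so the metric, the bin count, and the sample values are assumptions for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence, then average |accuracy - confidence| weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / total * abs(accuracy - avg_conf)
    return ece

# Hypothetical per-span confidences and whether each labeled span was correct.
span_confidences = [0.95, 0.90, 0.60, 0.55, 0.30]
span_correct = [True, False, True, False, False]
print(expected_calibration_error(span_confidences, span_correct))
```

A perfectly calibrated model scores 0. A model that reports 0.9 confidence but is right only half the time in that bin contributes a gap of 0.4.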
We consider the task of text generation in language models with constraints specified in natural language. To this end, we first create a challenging benchmark Cognac that provides as input to the model a topic with example text, along with a constraint on text to be avoided. Unlike prior work, our benchmark contains knowledge-intensive constraints sourced from databases like Wordnet and Wikidata, which allows for straightforward evaluation while striking a balance between broad attribute-level and narrow lexical-level controls. We find that even state-of-the-art language models like GPT-3 fail often on this task, and propose a solution to leverage a language model's own internal knowledge to guide generation. Our method, called CognacGen, first queries the language model to generate guidance terms for a specified topic or constraint, and uses the guidance to modify the model's token generation probabilities. We propose three forms of guidance (binary verifier, top-k tokens, textual example), and employ prefix-tuning approaches to distill the guidance to tackle diverse natural language constraints. Through extensive empirical evaluations, we demonstrate that CognacGen can successfully generalize to unseen instructions and outperform competitive baselines in generating constraint conforming text.
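The step of using guidance to modify token generation probabilities can be sketched as a simple logit adjustment before the softmax. This is only an illustration of the general idea: the abstract does not give CognacGen's actual update rule, and the penalty and bonus values, the function names, and the toy vocabulary are all hypothetical.

```python
import math

def softmax(logits):
    """Convert a token -> logit mapping into a probability distribution."""
    peak = max(logits.values())
    exps = {tok: math.exp(v - peak) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def apply_guidance(logits, avoid_terms, topic_terms=(), penalty=5.0, bonus=1.0):
    """Lower the logits of terms to avoid, raise on-topic terms, then renormalise."""
    adjusted = dict(logits)
    for tok in adjusted:
        if tok in avoid_terms:
            adjusted[tok] -= penalty
        elif tok in topic_terms:
            adjusted[tok] += bonus
    return softmax(adjusted)

# Toy next-token logits (hypothetical); the constraint says to avoid "dog".
next_token_logits = {"cat": 2.0, "dog": 2.0, "car": 0.0}
baseline = softmax(next_token_logits)
guided = apply_guidance(next_token_logits, avoid_terms={"dog"}, topic_terms={"cat"})
print(baseline["dog"], guided["dog"])
```

In this sketch the guidance terms would come from querying the language model itself, as the abstract describes; here they are hard-coded sets.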
Language models have been shown to perform better with an increase in scale on a wide variety of tasks via the in-context learning paradigm. In this paper, we investigate the hypothesis that the ability of a large language model to in-context learn-perform a task is not uniformly spread across all of its underlying components. Using a 66 billion parameter language model (OPT-66B) across a diverse set of 14 downstream tasks, we find this is indeed the case: $\sim$70% of attention heads and $\sim$20% of feed forward networks can be removed with minimal decline in task performance. We find substantial overlap in the set of attention heads (un)important for in-context learning across tasks and number of in-context examples. We also address our hypothesis through a task-agnostic lens, finding that a small set of attention heads in OPT-66B score highly on their ability to perform primitive induction operations associated with in-context learning, namely, prefix matching and copying. These induction heads overlap with task-specific important heads, suggesting that induction heads are among the heads capable of more sophisticated behaviors associated with in-context learning. Overall, our study provides several insights that indicate large language models may be under-trained to perform in-context learning and opens up questions on how to pre-train language models to more effectively perform in-context learning.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
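For readers unfamiliar with the K-fold cross-validation that the survey reports as underused, the following minimal sketch shows how a training set can be split into K folds so that each sample is used for validation exactly once. The function name and the sample counts are illustrative and not taken from the survey.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, val_indices) pairs covering range(n_samples)."""
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, val))
        start += size
    return folds

# Hypothetical training set of 10 images split into 3 folds.
for train_idx, val_idx in k_fold_indices(10, 3):
    print(len(train_idx), val_idx)
```

In practice the indices are usually shuffled before splitting, and one model is trained per fold. Those per-fold models are also a natural source for the ensembling the survey mentions.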